Abstract. We describe an algorithm for deciding the first-order multisorted theory BAPA, 
which combines 1) Boolean algebras of sets of uninterpreted elements (BA) and 2) Pres- 
burger arithmetic operations (PA). BAPA can express the relationship between integer vari- 
ables and cardinalities of sets, and supports arbitrary quantification over both sets and inte- 
gers. 

Our motivation for BAPA is deciding verification conditions that arise in the static analysis 
of data structure consistency properties. Data structures often use an integer variable to keep 
track of the number of elements they store; an invariant of such a data structure is that the 
value of the integer variable is equal to the number of elements stored in the data structure. 
When the data structure content is represented by a set, the resulting constraints can be cap- 
tured in BAPA. BAPA formulas with quantifier alternations arise when annotations contain 
quantifiers themselves, or when proving simulation relation conditions for refinement and 
equivalence of program fragments. Furthermore, BAPA constraints can be used to extend 
the techniques for proving the termination of integer programs to programs that manipulate 
data structures, and have applications in constraint databases. 

We give a formal description of a decision procedure for BAPA, which implies the decid- 
ability of the satisfiability and validity problems for BAPA. We analyze our algorithm and 
obtain an elementary upper bound on the running time, thereby giving the first complexity 
bound for BAPA. Because it works by a reduction to PA, our algorithm yields the decidabil- 
ity of a combination of sets of uninterpreted elements with any decidable extension of PA. 
Our algorithm can also be used to yield a space-optimal decision procedure for BA through 
a reduction to PA with bounded quantifiers. 

We have implemented our algorithm and used it to discharge verification conditions in the 
Jahob system for data structure consistency checking of Java programs; our experience with 
the algorithm is promising. 


1 Introduction 


Program analysis and verification tools can greatly contribute to software reliability, es- 
pecially when used throughout the software development process. Such tools are even 
more valuable if their behavior is predictable, if they can be applied to partial programs, 
and if they allow the developer to communicate the design information in the form of 
specifications. Combining the basic idea of [22,28] with decidable logics leads to anal- 
ysis tools that have these desirable properties. Such analyses are precise (because for- 
mulas represent loop-free code precisely) and predictable (because the checking of ver- 
ification conditions terminates either with a realizable counterexample or with a sound 
claim that there are no counterexamples). 

A key challenge in this approach to program analysis and verification is to identify 
a logic that captures an interesting class of program properties, but is nevertheless de- 
cidable. In [41-43, 80] we identify the first-order theory of Boolean algebras (BA) as a 
useful language for reasoning about dynamically allocated objects: BA allows expressing 
generalized typestate properties and reasoning about data structures as dynamically 
changing sets of objects. BA is known to be decidable [45, 67]. 

The motivation for this paper is the fact that we often need to reason not only about 
the data structure content, but also about the size of the data structure. For example, we 
may want to express the fact that the number of elements stored in a data structure is 
equal to the value of an integer variable that is used to cache the data structure size, or 
we may want to introduce a decreasing integer measure on the data structure to show 
program termination. These considerations lead to a natural generalization of the first- 
order theory of BA of sets, a generalization that allows integer variables in addition 
to set variables, and allows stating relations of the form |A| = k meaning that the 
cardinality of the set A is equal to the value of the integer variable k. Once we have 
integer variables, a natural question arises: which relations and operations on integers 
should we allow? It turns out that, using only the BA operations and the cardinality 
operator, we can already define all operations of PA. This leads to the structure BAPA, 
which properly generalizes both BA and PA. 

As we explain in Section 2, a version of BAPA was shown decidable already in [19] 
(which also proves the well-known Feferman-Vaught theorem [29, Section 9.6] about 
the products of first-order theories). Recently, a decision procedure for a fragment of 
BAPA without quantification over sets was presented in [79], cast as a multi-sorted the- 
ory. Starting from [43] as our motivation, we have observed in [38] the decidability of 
the full BAPA (which was initially left open in [79]). After our report [38], an algorithm 
for a language between BA and BAPA was presented in [62] as a way of evaluating 
queries in constraint databases. The constraints in [62] allow only constant integer pa- 
rameters and not integer variables; moreover, [62] still leaves open the complexity of 
the algorithm. 

Our paper gives the first formal description of a decision procedure for the full 
first-order theory of BAPA. Furthermore, we analyze our decision procedure and show 
that it yields an elementary upper bound on the complexity of BAPA. Our result is 
the first upper complexity bound on BAPA; along with a lower bound from PA, we 
obtain a good estimate of BAPA worst-case complexity. We have also implemented our 
decision procedure; we report on our initial experience in using the decision procedure 
in the context of a system for checking data structure consistency. 


Contributions. We summarize the contributions of our paper as follows. 


1. As a motivation for BAPA, we show in Section 3 how BAPA constraints can be 
used for program analysis and verification by expressing 1) data structure invari- 
ants, 2) the correctness of procedures with respect to their specifications, 3) simu- 
lation relations between program fragments, and 4) termination conditions for pro- 
grams that manipulate data structures. 


2. We present an algorithm α (Section 4) that translates BAPA sentences into PA 
sentences by translating set quantifiers into integer quantifiers. The algorithm is 
surprisingly simple (the entire source code is included in the Appendix, Section 12) 
and shows a deep connection between BA and PA. 


3. We analyze our algorithm α and show that it yields an elementary upper bound on 
the worst-case complexity of the validity problem for BAPA sentences that is close 
to the bound on PA sentences themselves (Section 5). This is the first complexity 
bound for BAPA, and is the main contribution of this paper. 

4. We discuss our experience in using our implementation of BAPA to discharge 
verification conditions generated in the Jahob verification system [34]. 

5. In addition, we note the following related complexity, decidability and undecidabil- 
ity results: 

(a) We show that PA sentences generated by translating pure BA sentences can be 
checked for validity in singly exponential space, which is a good bound in 
light of the alternating exponential lower bound for BA (Section 5.2). 

(b) We show how to extend our algorithm to infinite sets and predicates for distin- 
guishing finite and infinite sets (Section 10). 

(c) We examine the relationship of our results to the monadic second-order logic 
(MSOL) of strings (Section 11). In contrast to the undecidability of MSOL with 
equicardinality operator (Section 11.2), we identify a combination of MSOL 
over trees with BA that is decidable. This result follows from the fact that our 
algorithm α enables adding BA operations to any extension of PA, including 
decidable extensions such as MSOL over strings (Section 11.1). 


A preliminary version of our results, including the algorithm and complexity analysis, 
appears in [38], which also contains some background on quantifier elimination. 


2 The First-Order Theory BAPA 


Figure 3 presents the syntax of Boolean Algebra with Presburger Arithmetic (BAPA), 
which is the focus of this paper. We next present some justification for the operations in 
Figure 3. Our initial motivation for BAPA was the use of BA to reason about data structures 
in terms of sets [40]. Our language for BA (Figure 1) allows cardinality constraints 
of the form $|A| = C$ where $C$ is a constant integer. Such constant cardinality constraints 
are useful and enable quantifier elimination for the resulting language [45,67]. However, 
they do not allow stating constraints such as $|A| = |B|$ for two sets $A$ and $B$, and cannot 
represent constraints on changing program variables. Consider therefore the equicardinality 
relation $\mathrm{eqcard}(A, B)$ that holds iff $|A| = |B|$, and consider BA extended with the 
relation $\mathrm{eqcard}(A, B)$. Define the ternary relation 
$\mathrm{plus}(A, B, C) \iff (|A| + |B| = |C|)$ by the formula 
$\exists x_1.\ \exists x_2.\ x_1 \cap x_2 = \emptyset \land C = x_1 \cup x_2 \land \mathrm{eqcard}(A, x_1) \land \mathrm{eqcard}(B, x_2)$. 
The relation $\mathrm{plus}(A, B, C)$ allows us to express addition using arbitrary sets as 
representatives for natural numbers. Moreover, we can represent integers as equivalence 
classes of pairs of natural numbers under the equivalence relation 
$(x, y) \sim (u, v) \iff x + v = u + y$. This construction allows us to express the unary 
predicate of being non-negative. The quantification over pairs of sets represents 
quantification over integers, and quantification over integers with the addition operation 
and the predicate “being non-negative” can express all PA operations, presented in Figure 2. 
Therefore, a natural closure under definable operations leads to our formulation of the 
language BAPA in Figure 3, which contains both sets and integers. 
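
The definition of plus in terms of eqcard can be checked by brute force on a small universe. The following Python sketch is only an illustration and is not part of our implementation; it assumes the defining formula reads "there exist disjoint x1, x2 with C = x1 ∪ x2, eqcard(A, x1) and eqcard(B, x2)", and the helper names `subsets`, `eqcard` and `plus` are chosen here for the example.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def eqcard(a, b):
    """Equicardinality relation: |a| = |b|."""
    return len(a) == len(b)


def plus(a, b, c):
    """plus(A, B, C), defined using only set operations and eqcard.

    Disjointness together with c = x1 | x2 forces x1 to be a subset of c
    and x2 to be c - x1, so enumerating the subsets of c is enough.
    """
    return any(eqcard(a, x1) and eqcard(b, c - x1) for x1 in subsets(c))
```

On every triple of subsets of a four-element universe, `plus(a, b, c)` agrees with the arithmetic statement `len(a) + len(b) == len(c)`, which is the sense in which sets act as representatives for natural numbers.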

The argument above also explains why we attribute the decidability of BAPA to [19, 
Section 8], which showed the decidability of BA over sets extended with the equicardi- 
nality relation eqcard, using the decidability of the first-order theory of the addition of 
cardinal numbers. 
Fig. 3. Formulas of Boolean Algebras with Presburger Arithmetic (BAPA) 


The language BAPA has two kinds of quantifiers: quantifiers over integers and quantifiers 
over sets; we distinguish between these two kinds by denoting integer variables 
with symbols such as $k, l$ and set variables with symbols such as $x, y$. We use the 
shorthand $\exists^{+}k.F(k)$ to denote $\exists k.\ k \geq 0 \land F(k)$ and, similarly, 
$\forall^{+}k.F(k)$ to denote $\forall k.\ k \geq 0 \Rightarrow F(k)$. In summary, the 
language of BAPA in Figure 3: 1) subsumes the language of PA in Figure 2; 2) subsumes the 
language of BA in Figure 1; and 3) contains a non-trivial combination of these two languages 
in the form of using the cardinality of a set expression as an integer value. 

The semantics of operations in Figure 3 is the expected one. We interpret integer 
operations in the standard way, and interpret sets in the Boolean algebra of subsets of a 
finite set. The MAXC constant denotes the size of the finite universe $\mathcal{U}$, so we 
require $\mathrm{MAXC} = |\mathcal{U}|$ in all models. (Our results also extend to infinite 
sets; see Section 10 for the discussion.) 


3 Applications of BAPA 


This section illustrates the importance of BAPA constraints. Section 3.1 shows the uses 
of BAPA constraints to express and verify data structure invariants as well as procedure 
preconditions and postconditions. Section 3.2 shows how a class of simulation relation 
conditions can be proved automatically using a decision procedure for BAPA. Finally, 
Section 3.3 shows how BAPA can be used to express and prove termination conditions 
for a class of programs. 


3.1 Verifying Data Structure Consistency 


Figure 4 presents a procedure insert in a language that directly manipulates sets. Such 
languages can either be directly executed [18,66] or can be derived from executable 
programs using an abstraction process [41,43]. The program in Figure 4 manipulates 
a global set of objects content and an integer field size. The program maintains an 
invariant I that the size of the set content is equal to the value of the variable size. 
The insert procedure inserts an element e into the set and correspondingly updates the 
integer variable. The requires clause (precondition) of the insert procedure is that the 
parameter e is a non-null reference to an object that is not stored in the set content. 
The ensures clause (postcondition) of the procedure is that the size variable after the 
insertion is positive. Note that we represent references to objects (such as the procedure 
parameter e) as sets with at most one element. An empty set represents a null reference; 
a singleton set {o} represents a reference to object o. The value of a variable after 
procedure execution is indicated by marking the variable name with a prime. 


var content : set; 
var size : integer; 
invariant I <=> (size = |content|); 

procedure insert(e : element) 
  maintains I 
  requires |e| = 1 ∧ |e ∩ content| = 0 
  ensures size' > 0 
{ 
  content := content ∪ e; 
  size := size + 1; 
} 

Fig. 4. An Example Procedure 

{ |e| = 1 ∧ |e ∩ content| = 0 ∧ size = |content| } 
  content := content ∪ e; size := size + 1; 
{ size' > 0 ∧ size' = |content'| } 

Fig. 5. Hoare Triple for insert Procedure 

Fig. 6. Verification Condition for Figure 5 


In addition to the explicit requires and ensures clauses, the insert procedure maintains 
an invariant, I, which captures the relationship between the size of the set content 
and the integer variable size. The invariant I is implicitly conjoined with the requires 
and the ensures clause of the procedure. The Hoare triple [28] in Figure 5 summarizes 
the resulting correctness condition for the insert procedure. 

Figure 6 presents a verification condition corresponding to the Hoare triple in Fig- 
ure 5. Note that the verification condition contains both set and integer variables, con- 
tains quantification over these variables, and relates the sizes of sets to the values of 
integer variables. Our small example leads to a particularly simple formula; in general, 
formulas that arise in the compositional analysis of set programs with integer variables 
may contain alternations of existential and universal variables over both integers and 
sets. This paper shows the decidability of such formulas and presents the complexity of 
the decision procedure. 
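
As a sanity check of the Hoare triple for insert, the following Python sketch enumerates all states over a small universe and tests the postcondition whenever the precondition and the invariant hold. It is an illustrative brute-force check, not the decision procedure of this paper; the function names and the `require_fresh` switch are assumptions made for the example.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def insert_triple_holds(n, require_fresh=True):
    """Check the Hoare triple for insert over a universe of n objects.

    With require_fresh=False the conjunct |e ∩ content| = 0 is dropped
    from the precondition, which should break the triple.
    """
    universe = range(n)
    for content in subsets(universe):
        for e in subsets(universe):
            for size in range(-1, n + 2):
                pre = len(e) == 1 and size == len(content)
                if require_fresh:
                    pre = pre and len(e & content) == 0
                if not pre:
                    continue
                # Effect of the procedure body.
                content_after = content | e
                size_after = size + 1
                # Postcondition conjoined with the invariant.
                if not (size_after > 0 and size_after == len(content_after)):
                    return False
    return True
```

With the full precondition the check succeeds for small universes; dropping the freshness conjunct yields a counterexample as soon as the universe is non-empty, because inserting an element that is already present leaves |content| unchanged while size grows.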


3.2 Proving Simulation Relation Conditions 


Another example of where BAPA constraints are useful is when proving that a given re- 
lation on states is a simulation relation between two program fragments. Figure 7 shows 
one such example. The concrete procedure start1 manipulates two sets: a set of running 
processes and a set of suspended processes in a process scheduler. The procedure start1 
inserts a new process into the set of running processes, unless there are already too many 
running processes. The procedure start2 is a version of the procedure that operates in 
a more abstract state space: it maintains only the union of all processes and a num- 
ber of running processes. Figure 7 shows a forward simulation relation r between the 
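transition relations for start1 and start2. The standard simulation relation diagram 
condition [46] is 
$\forall s_1.\forall s_1'.\forall s_2.\ (t_1(s_1, s_1') \land r(s_1, s_2)) \Rightarrow \exists s_2'.\ (t_2(s_2, s_2') \land r(s_1', s_2'))$. 
In the presence of preconditions, 
$t_1(s_1, s_1') = (\mathrm{pre}_1(s_1) \Rightarrow \mathrm{post}_1(s_1, s_1'))$ and 
$t_2(s_2, s_2') = (\mathrm{pre}_2(s_2) \Rightarrow \mathrm{post}_2(s_2, s_2'))$, 
and sufficient conditions for the simulation relation are: 


1. $\forall s_1.\forall s_2.\ r(s_1, s_2) \land \mathrm{pre}_2(s_2) \Rightarrow \mathrm{pre}_1(s_1)$ 
2. $\forall s_1.\forall s_1'.\forall s_2.\exists s_2'.\ r(s_1, s_2) \land \mathrm{post}_1(s_1, s_1') \land \mathrm{pre}_2(s_2) \Rightarrow \mathrm{post}_2(s_2, s_2') \land r(s_1', s_2')$ 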


Figure 7 shows BAPA formulas that correspond to the simulation relation conditions in 
this example. Note that the second BAPA formula has a quantifier alternation, which 
illustrates the relevance of quantifiers in BAPA. 


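var R : set; 
var S : set; 

procedure start1(x) 
  requires x ⊄ R ∧ |x| = 1 ∧ |R| < MAXR 
  ensures R' = R ∪ x ∧ S' = S 
{ 
  R := R ∪ x; 
} 

var P : set; 
var k : integer; 

procedure start2(x) 
  requires x ⊄ P ∧ |x| = 1 ∧ k < MAXR 
  ensures P' = P ∪ x ∧ k' = k + 1 
{ 
  P := P ∪ x; 
  k := k + 1; 
} 

Simulation relation r: 
  $r((R, S), (P, k)) = (P = R \cup S \land k = |R|)$ 

Simulation relation conditions in BAPA: 
1. $\forall x, R, S, P, k.\ (P = R \cup S \land k = |R|) \land (x \not\subset P \land |x| = 1 \land k < \mathrm{MAXR}) \Rightarrow (x \not\subset R \land |x| = 1 \land |R| < \mathrm{MAXR})$ 
2. $\forall x, R, S, R', S', P, k.\ \exists P', k'.\ ((P = R \cup S \land k = |R|) \land (R' = R \cup x \land S' = S) \land (x \not\subset P \land |x| = 1 \land k < \mathrm{MAXR})) \Rightarrow (P' = P \cup x \land k' = k + 1) \land (P' = R' \cup S' \land k' = |R'|)$ 

Fig. 7. Proving simulation relation in BAPA 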
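
The two simulation conditions for start1 and start2 can also be tested exhaustively on a small universe. The Python sketch below is a brute-force illustration under the assumption that the abstract state consists of the set P and the integer k, with r((R, S), (P, k)) = (P = R ∪ S ∧ k = |R|); the function name `simulation_conditions_hold` is hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def simulation_conditions_hold(n, maxr):
    """Check simulation conditions 1 and 2 over a universe of n objects."""
    universe = range(n)
    for x in subsets(universe):
        for R in subsets(universe):
            for S in subsets(universe):
                # The relation r fixes the abstract state from the concrete one.
                P, k = R | S, len(R)
                pre2 = (not x <= P) and len(x) == 1 and k < maxr
                pre1 = (not x <= R) and len(x) == 1 and len(R) < maxr
                # Condition 1: r and pre2 imply pre1.
                if pre2 and not pre1:
                    return False
                if pre2:
                    # post1 fixes the concrete post-state.
                    R_after, S_after = R | x, S
                    # post2 fixes the only candidate witness for P', k'.
                    P_after, k_after = P | x, k + 1
                    # Condition 2: the witness must again satisfy r.
                    if P_after != (R_after | S_after) or k_after != len(R_after):
                        return False
    return True
```

Because post2 determines P' and k' uniquely, the existential quantifier in condition 2 has a single candidate witness, which keeps the enumeration small.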


3.3 Proving Termination of Programs 


We next show how BAPA is useful for proving program termination. A standard technique 
for proving termination of a loop is to introduce a ranking function $f$ that maps 
program state into a non-negative integer, and then prove that the value of the function 
decreases at each loop iteration. In other words, if $t(s, s')$ denotes the relationship 
between the state at the beginning and end of the loop body, then the condition 
$\forall s.\forall s'.\ t(s, s') \Rightarrow f(s) > f(s')$ holds. Figure 8 shows an example 
program that processes each element of the initial value of set iter; this program can be 
viewed as manipulating an iterator over a data structure that implements a set. The ability 
to take the cardinality of a set allows us to define a natural ranking function for this 
program. Figure 9 shows the termination proof based on such a ranking function. Note that, 
because the loop contains a local variable, the resulting loop transition relation contains 
an existential quantifier. The resulting termination condition can be expressed as a formula 
that belongs to BAPA, and can be discharged using our decision procedure. In general, 
we can reduce the termination problem of programs that manipulate both sets and integers 
to showing a simulation relation with a fragment of a terminating program that 
manipulates only integers, which can be proved terminating using the techniques of [55-57]. 
The simulation relation condition can be proved correct using our BAPA decision procedure 
whenever the simulation relation is expressible with a BAPA formula. 

var iter : set; 

procedure iterate() 
{ 
  while iter ≠ ∅ do 
    var e : set; 
    e := choose iter; 
    iter := iter \ e; 
    process(e); 
  done 
} 

Fig. 8. Terminating program 

Ranking function: 
  $f(s) = |s|$ 

Transition relation: 
  $t(\mathit{iter}, \mathit{iter}') = (\exists e.\ |e| = 1 \land e \subseteq \mathit{iter} \land \mathit{iter}' = \mathit{iter} \setminus e)$ 

Termination condition in BAPA: 
  $\forall \mathit{iter}.\forall \mathit{iter}'.\ (\exists e.\ |e| = 1 \land e \subseteq \mathit{iter} \land \mathit{iter}' = \mathit{iter} \setminus e) \Rightarrow |\mathit{iter}'| < |\mathit{iter}|$ 

Fig. 9. Termination proof for Figure 8 
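
The termination condition for the iterate loop can be illustrated by a small Python sketch that checks the decrease of the ranking function |iter| on every transition over a small universe and also runs the loop while counting iterations. This is an assumed, simplified model of the loop (the call to process is omitted), and the function names are chosen for the example.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def termination_condition_holds(n):
    """Check: every transition of the loop body strictly decreases |iter|."""
    universe = range(n)
    for it in subsets(universe):
        for it_after in subsets(universe):
            # Transition relation: some singleton e, contained in iter, is removed.
            step = any(
                len(e) == 1 and e <= it and it_after == it - e
                for e in subsets(universe)
            )
            if step and not len(it_after) < len(it):
                return False
    return True


def iterate(initial):
    """Run the loop and return the number of iterations performed."""
    it = set(initial)
    steps = 0
    while it:
        e = next(iter(it))  # models "choose iter"
        it = it - {e}
        steps += 1
    return steps
```

The loop performs exactly as many iterations as the initial cardinality of the set, which matches the ranking function f(s) = |s| reaching zero.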


4 Decision Procedure for BAPA 


This section presents our algorithm, denoted α, which reduces a BAPA sentence to an 
equivalent PA sentence with the same number of quantifier alternations and an exponential 
increase in the total size of the formula. This algorithm has several desirable 
properties: 


1. Given the space and time bounds for PA sentences [61], the algorithm α yields 
reasonable space and time bounds for deciding BAPA sentences (Section 5). 

2. The algorithm α does not eliminate integer variables, but instead produces an 
equivalent quantified PA sentence. The resulting PA sentence can therefore be decided 
using any decision procedure for PA, including the decision procedures based on 
automata [23, 31, 44]. 

3. The algorithm α can eliminate set quantifiers from any extension of PA. We thus 
obtain a technique for adding a particular form of set reasoning to every extension 
of PA, and the technique preserves the decidability of the extension. One example 
of a decidable theory that extends PA is MSOL over strings; see Section 11 for 
the discussion. 

4. For simplicity we present the algorithm α as a decision procedure for formulas 
with no free variables, but the algorithm can be used to transform and simplify 
formulas with free variables as well, because it transforms one quantifier at a time 
starting from the innermost one. Because of this feature, we can use the algorithm 
α to project out local state components from formulas that describe invariants and 
transition relations, and simplify the resulting formulas. 


We next describe the algorithm α for transforming a BAPA sentence $F_0$ into a PA 
sentence. As the first step of the algorithm, transform $F_0$ into prenex form 

\[ Q_p v_p.\ \ldots\ Q_1 v_1.\ F(v_1, \ldots, v_p) \tag{1} \] 

where $F$ is quantifier-free, and each quantifier $Q_i v_i$ is of one of the forms 
$\exists k$, $\forall k$, $\exists y$, $\forall y$, where $k$ denotes an integer variable 
and $y$ denotes a set variable. 

The next step of the algorithm is to separate $F$ into a BA part and a PA part. To achieve 
this, replace each formula $x = y$, where $x$ and $y$ are sets, with the conjunction 
$x \subseteq y \land y \subseteq x$, and replace each formula $x \subseteq y$ with the 
equivalent formula $|x \cap y^c| = 0$. In the resulting formula, each set $x$ occurs in 
some term $|t(x)|$. Next, use the same reasoning as when generating disjunctive normal 
form for propositional logic to write each set expression $t(x)$ as a union of cubes 
(regions in a Venn diagram [74]) of the form $\bigcap_{i=1}^{n} x_i^{\sigma_i}$ where 
$x_i^{\sigma_i}$ is either $x_i$ or $x_i^c$; hence there are $m = 2^n$ cubes 
$s_1, \ldots, s_m$. Suppose that $t(x) = s_{j_1} \cup \ldots \cup s_{j_a}$; then replace 
the term $|t(x)|$ with the term $\sum_{i=1}^{a} |s_{j_i}|$. In the resulting formula, 
each set $x$ appears in an expression of the form $|s_i|$ where $s_i$ is a cube. For each 
$s_i$ introduce a new variable $l_i$. Then the resulting formula is equivalent to 

\[ Q_p v_p.\ \ldots\ Q_1 v_1.\ \exists^{+} l_1, \ldots, l_m.\ \bigwedge_{i=1}^{m} |s_i| = l_i \ \land\ G_1 \tag{2} \] 

where $G_1$ is a PA formula and $m = 2^n$. Formula (2) is the starting point of the main 
phase of algorithm α. The main phase of the algorithm successively eliminates quantifiers 
$Q_1 v_1, \ldots, Q_p v_p$ while maintaining a formula of the form 

\[ Q_p v_p.\ \ldots\ Q_r v_r.\ \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q} |s_i| = l_i \ \land\ G_r \tag{3} \] 

where $G_r$ is a PA formula, $r$ grows from 1 to $p + 1$, and $q = 2^e$ where $e$, for 
$0 \leq e \leq n$, is the number of set variables among $v_p, \ldots, v_r$. The list 
$s_1, \ldots, s_q$ is the list of all $2^e$ partitions formed from the set variables 
among $v_p, \ldots, v_r$. 
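
The decomposition of a set expression into cubes can be sketched in Python as follows. This is an illustration under assumptions: set expressions are encoded as nested tuples (`"var"`, `"compl"`, `"union"`, `"inter"`), a cube is a tuple of bits with bit i equal to 1 when the cube lies inside variable i, and the names `l_...` for the cardinality variables are chosen for the example. It is not the O'Caml code of Section 12.

```python
from itertools import product


def member(expr, bits):
    """Decide whether the cube described by `bits` lies inside `expr`."""
    kind = expr[0]
    if kind == "var":
        return bits[expr[1]] == 1
    if kind == "compl":
        return not member(expr[1], bits)
    if kind == "union":
        return member(expr[1], bits) or member(expr[2], bits)
    if kind == "inter":
        return member(expr[1], bits) and member(expr[2], bits)
    raise ValueError(f"unknown set operation: {kind}")


def cubes_of(expr, n):
    """Return the cubes (over n set variables) whose disjoint union is `expr`."""
    return {bits for bits in product((0, 1), repeat=n) if member(expr, bits)}


def cardinality_as_sum(expr, n):
    """Names of the integer variables whose sum replaces |expr|."""
    return sorted("l_" + "".join(map(str, bits)) for bits in cubes_of(expr, n))
```

For example, with variables numbered 0 (content'), 1 (content) and 2 (e), the expression (content ∪ e) ∩ content'^c decomposes into the three cubes 001, 010 and 011, so its cardinality becomes l_001 + l_010 + l_011.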

We next show how to eliminate the innermost quantifier $Q_r v_r$ from the formula (3). 
During this process, the algorithm replaces the formula $G_r$ with a formula $G_{r+1}$ which 
has more integer quantifiers. If $v_r$ is an integer variable then the number of sets $q$ 
remains the same, and if $v_r$ is a set variable, then $q$ reduces from $2^e$ to $2^{e-1}$. 
We next consider each of the four possibilities $\exists k$, $\forall k$, $\exists y$, 
$\forall y$ for the quantifier $Q_r v_r$. 

Consider first the case $\exists k$. Because $k$ does not occur in 
$\bigwedge_{i=1}^{q} |s_i| = l_i$, simply move the existential quantifier to $G_r$ and let 
$G_{r+1} = \exists k. G_r$, which completes the step. 


For universal quantifiers, observe that 

\[ \lnot \Big( \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q} |s_i| = l_i \ \land\ G_r \Big) \] 

is equivalent to $\exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q} |s_i| = l_i \land \lnot G_r$, 
because the existential quantifier is used as a let-binding, so we may first substitute all 
values $l_i$ into $G_r$, then perform the negation, and then extract back the definitions of 
all values $l_i$. Given that the universal quantifier $\forall k$ can be represented as a 
sequence of unary operators $\lnot \exists k \lnot$, from the elimination of $\exists k$ we 
immediately obtain the elimination of $\forall k$; it turns out that it suffices 
to let $G_{r+1} = \forall k. G_r$. 

We next show how to eliminate an existential set quantifier $\exists y$ from 

\[ \exists y.\ \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q} |s_i| = l_i \ \land\ G_r \tag{4} \] 

which is equivalent to 
$\exists^{+} l_1 \ldots l_q.\ (\exists y.\ \bigwedge_{i=1}^{q} |s_i| = l_i) \land G_r$. 
This is the key step of the algorithm and relies on the following lemma, whose proof is in 
Section 9. 


Lemma 1. Let $b_1, \ldots, b_n$ be finite disjoint sets, and $l_1, \ldots, l_n, k_1, \ldots, k_n$ 
be natural numbers. Then the following two statements are equivalent: (1) There exists a 
finite set $y$ such that $\bigwedge_{i=1}^{n} |b_i \cap y| = k_i \land |b_i \cap y^c| = l_i$ 
and (2) $\bigwedge_{i=1}^{n} |b_i| = k_i + l_i$. Moreover, the statement continues to hold 
if for any subset of indices $i$ the conjunct $|b_i \cap y| = k_i$ is replaced by 
$|b_i \cap y| \geq k_i$ or the conjunct $|b_i \cap y^c| = l_i$ is replaced by 
$|b_i \cap y^c| \geq l_i$, provided that $|b_i| = k_i + l_i$ is replaced by 
$|b_i| \geq k_i + l_i$, as indicated in Figure 10. 

original formula                                  | eliminated form 
∃y. ... |b ∩ y| ≥ k ∧ |b ∩ y^c| ≥ l ...           | |b| ≥ k + l 
∃y. ... |b ∩ y| = k ∧ |b ∩ y^c| ≥ l ...           | |b| ≥ k + l 
∃y. ... |b ∩ y| ≥ k ∧ |b ∩ y^c| = l ...           | |b| ≥ k + l 
∃y. ... |b ∩ y| = k ∧ |b ∩ y^c| = l ...           | |b| = k + l 

Fig. 10. Rules for Eliminating Quantifiers from Boolean Algebra Expressions 
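
Lemma 1 and the rules of Figure 10 can be tested exhaustively for small disjoint sets. The Python sketch below is an illustration only, and for simplicity it applies the same pair of relations (= or ≥) to every index i rather than to an arbitrary subset of indices; the function name `lemma1_agrees` is hypothetical.

```python
import operator
from itertools import combinations, product


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def lemma1_agrees(blocks, universe, rel_k=operator.eq, rel_l=operator.eq):
    """Compare both sides of Lemma 1 for all small values of k_i and l_i.

    rel_k relates |b_i ∩ y| to k_i and rel_l relates |b_i ∩ y^c| to l_i.
    The eliminated form uses = only when both relations are =, else >=.
    """
    both_eq = rel_k is operator.eq and rel_l is operator.eq
    rel_sum = operator.eq if both_eq else operator.ge
    bound = max(len(b) for b in blocks) + 1
    candidates = list(subsets(universe))
    for values in product(range(bound + 1), repeat=2 * len(blocks)):
        ks, ls = values[0::2], values[1::2]
        # Statement (1): some set y realizes the required intersections.
        exists_y = any(
            all(
                rel_k(len(b & y), k) and rel_l(len(b - y), l)
                for b, k, l in zip(blocks, ks, ls)
            )
            for y in candidates
        )
        # Statement (2): the purely arithmetic condition on |b_i|.
        arithmetic = all(
            rel_sum(len(b), k + l) for b, k, l in zip(blocks, ks, ls)
        )
        if exists_y != arithmetic:
            return False
    return True
```

Here `b - y` plays the role of the intersection of b with the complement of y, so the universe only matters as the range of candidate sets y.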
In the quantifier elimination step, assume without loss of generality that the sets 
$s_1, \ldots, s_q$ are numbered such that $s_{2i-1} = s_i' \cap y^c$ and 
$s_{2i} = s_i' \cap y$ for some cube $s_i'$. Then apply Lemma 1 and replace each pair of 
conjuncts 

\[ |s_i' \cap y^c| = l_{2i-1} \ \land\ |s_i' \cap y| = l_{2i} \] 

with the conjunct $|s_i'| = l_{2i-1} + l_{2i}$, yielding the formula 

\[ \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q'} |s_i'| = l_{2i-1} + l_{2i} \ \land\ G_r \tag{5} \] 

for $q' = 2^{e-1}$. Finally, to obtain a formula of the form (3) for $r + 1$, introduce fresh 
variables $l_i'$ constrained by $l_i' = l_{2i-1} + l_{2i}$, rewrite (5) as 

\[ \exists^{+} l_1' \ldots l_{q'}'.\ \bigwedge_{i=1}^{q'} |s_i'| = l_i' \ \land\ \Big( \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q'} l_i' = l_{2i-1} + l_{2i} \ \land\ G_r \Big) \] 

and let 

\[ G_{r+1} \equiv \exists^{+} l_1 \ldots l_q.\ \bigwedge_{i=1}^{q'} l_i' = l_{2i-1} + l_{2i} \ \land\ G_r \tag{6} \] 

This completes the description of elimination of an existential set quantifier $\exists y$. 

To eliminate a set quantifier $\forall y$, proceed analogously: introduce fresh variables 
$l_i' = l_{2i-1} + l_{2i}$ and let 
$G_{r+1} \equiv \forall^{+} l_1 \ldots l_q.\ (\bigwedge_{i=1}^{q'} l_i' = l_{2i-1} + l_{2i}) \Rightarrow G_r$, 
which can be verified by expressing $\forall y$ as $\lnot \exists y \lnot$. 

After eliminating all quantifiers as described above, we obtain a formula of the form 
$\exists^{+} l.\ |\mathcal{U}| = l \land G_{p+1}(l)$. We define the result of the algorithm, 
denoted $\alpha(F_0)$, to be the PA sentence $G_{p+1}(\mathrm{MAXC})$. 
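
The effect of replacing set quantifiers by blocks of integer quantifiers can be illustrated with a small Python sketch. It does not build the PA formula; instead, for a prefix of set quantifiers and a matrix that depends only on cube cardinalities, it evaluates the sentence once by enumerating actual sets and once by enumerating non-negative integers that split the previously introduced cardinalities, and the two results can be compared. The encoding (prefix as a list of "E"/"A", matrix as a Python predicate on a dictionary of cube sizes) and the function names are assumptions made for this example.

```python
from itertools import combinations, product


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def decide_with_sets(prefix, matrix, maxc):
    """Evaluate the sentence by quantifying over subsets of {0, ..., maxc-1}."""
    universe = range(maxc)

    def cube_sizes(chosen):
        sizes = {}
        for bits in product((1, 0), repeat=len(chosen)):
            sizes[bits] = sum(
                all((u in y) == (bit == 1) for y, bit in zip(chosen, bits))
                for u in universe
            )
        return sizes

    def go(chosen):
        if len(chosen) == len(prefix):
            return matrix(cube_sizes(chosen))
        branches = (go(chosen + [y]) for y in subsets(universe))
        return any(branches) if prefix[len(chosen)] == "E" else all(branches)

    return go([])


def decide_with_integers(prefix, matrix, maxc):
    """Evaluate the same sentence using only integer quantification.

    Each set quantifier becomes a block of integer quantifiers that split
    every current cube size l into (inside y) + (outside y).
    """
    def go(depth, sizes):
        if depth == len(prefix):
            return matrix(sizes)
        cubes = list(sizes)

        def refined(inside):
            new_sizes = {}
            for cube, k in zip(cubes, inside):
                new_sizes[cube + (1,)] = k
                new_sizes[cube + (0,)] = sizes[cube] - k
            return new_sizes

        choices = product(*[range(sizes[c] + 1) for c in cubes])
        branches = (go(depth + 1, refined(inside)) for inside in choices)
        return any(branches) if prefix[depth] == "E" else all(branches)

    # The single initial cube is the whole universe, of size MAXC.
    return go(0, {(): maxc})
```

For instance, with the matrix `lambda l: l[(1, 1)] == 0 and l[(0, 0)] == 0` (the second set is the complement of the first) and prefix `["A", "E"]`, both evaluations report that the sentence holds, and they agree on the other quantifier prefixes as well.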

This completes the description of the algorithm α. Given that the validity of PA 
sentences is decidable, the algorithm α is a decision procedure for BAPA sentences. 


Theorem 2. The algorithm α described above maps each BAPA-sentence $F_0$ into an 
equivalent PA-sentence $\alpha(F_0)$. 


Formalization of the algorithm α. To formalize the algorithm α, we have implemented 
it in the functional programming language O’Caml; Section 12 contains the source 
code of the implementation. As an illustration, when we run the implementation on the 
BAPA formula in Figure 6 which represents a verification condition, we immediately 
obtain the PA formula in Figure 11. Note that the structure of the resulting formula 
mimics the structure of the original formula: every set quantifier is replaced by the 
corresponding block of quantifiers over non-negative integers constrained to partition the 
previously introduced integer variables. Figure 12 presents the correspondence between 
the set variables of the BAPA formula and the integer variables of the translated PA 
formula. Note that the relationship content' = content ∪ e translates into the conjunction 
of the constraints 
$|\mathit{content}' \cap (\mathit{content} \cup e)^c| = 0 \land |(\mathit{content} \cup e) \cap \mathit{content}'^c| = 0$, 
which reduces to the conjunction $l_{100} = 0 \land l_{011} + l_{001} + l_{010} = 0$ using 
the translation of set expressions into the disjoint union of partitions, and the 
correspondence in Figure 12. 

The subsequent sections explore the consequences of the existence of the algorithm 
α, including an upper bound on the computational complexity of BAPA sentences and 
the combination of BA with proper extensions of PA. We explain our experience with 
using the implementation in Section 6. 


5 Complexity 


In this section we analyze the algorithm α from Section 4 and obtain space and time 
bounds on BAPA from the corresponding space and time bounds for PA. We then show 
that the new decision procedure meets good worst-case space bounds for BA if applied 
to BA formulas. Moreover, by construction, our procedure reduces to the procedure for 
Presburger arithmetic formulas if there are no set quantifiers. In summary, our decision 
procedure is reasonable for BA, does not impose any overhead for pure PA formulas, 
and the complexity of the general BAPA validity has the same height of the tower of 
exponentials as the complexity of PA itself. 
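\[ 
\begin{array}{l} 
\forall^{+} l_1.\ \forall^{+} l_0.\ \mathrm{MAXC} = l_1 + l_0 \Rightarrow \\ 
\forall^{+} l_{11}.\ \forall^{+} l_{01}.\ \forall^{+} l_{10}.\ \forall^{+} l_{00}. \\ 
\quad l_1 = l_{11} + l_{01} \land l_0 = l_{10} + l_{00} \Rightarrow \\ 
\forall^{+} l_{111}.\ \forall^{+} l_{011}.\ \forall^{+} l_{101}.\ \forall^{+} l_{001}. \\ 
\forall^{+} l_{110}.\ \forall^{+} l_{010}.\ \forall^{+} l_{100}.\ \forall^{+} l_{000}. \\ 
\quad l_{11} = l_{111} + l_{011} \land l_{01} = l_{101} + l_{001} \land \\ 
\quad l_{10} = l_{110} + l_{010} \land l_{00} = l_{100} + l_{000} \Rightarrow \\ 
\forall \mathit{size}.\ \forall \mathit{size}'. \\ 
\quad (l_{111} + l_{011} + l_{101} + l_{001} = 1 \land \\ 
\quad\ l_{111} + l_{011} = 0 \land \\ 
\quad\ l_{111} + l_{011} + l_{110} + l_{010} = \mathit{size} \land \\ 
\quad\ l_{100} = 0 \land \\ 
\quad\ l_{011} + l_{001} + l_{010} = 0 \land \\ 
\quad\ \mathit{size}' = \mathit{size} + 1) \Rightarrow \\ 
\quad (0 < \mathit{size}' \land \\ 
\quad\ l_{111} + l_{101} + l_{110} + l_{100} = \mathit{size}') 
\end{array} 
\] 

Fig. 11. The translation of the BAPA sentence from Figure 6 into a PA sentence 

general relationship: 
  $l_{i_1, \ldots, i_k} = |\mathit{set}_q^{i_1} \cap \mathit{set}_{q+1}^{i_2} \cap \ldots \cap \mathit{set}_S^{i_k}|$, 
  $q = S - (k - 1)$, where $S$ is the number of set variables 

in this example: 
  $\mathit{set}_1 = \mathit{content}'$ 
  $\mathit{set}_2 = \mathit{content}$ 
  $\mathit{set}_3 = e$ 

  $l_{000} = |\mathit{content}'^c \cap \mathit{content}^c \cap e^c|$ 
  $l_{001} = |\mathit{content}'^c \cap \mathit{content}^c \cap e|$ 
  $l_{010} = |\mathit{content}'^c \cap \mathit{content} \cap e^c|$ 
  $l_{011} = |\mathit{content}'^c \cap \mathit{content} \cap e|$ 
  $l_{100} = |\mathit{content}' \cap \mathit{content}^c \cap e^c|$ 
  $l_{101} = |\mathit{content}' \cap \mathit{content}^c \cap e|$ 
  $l_{110} = |\mathit{content}' \cap \mathit{content} \cap e^c|$ 
  $l_{111} = |\mathit{content}' \cap \mathit{content} \cap e|$ 

Fig. 12. The Correspondence between Integer Variables in Figure 11 and Set Variables in Figure 6 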
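
The PA sentence of Figure 11 can be evaluated directly for small values of MAXC. The Python sketch below is an illustration under an assumed reading of the figure: the eight innermost variables are indexed by membership bits for (content', content, e), they are non-negative and sum to MAXC, and the intermediate variables are determined by them, so only the innermost block is enumerated. The names `compositions` and `figure11_holds` are hypothetical.

```python
from itertools import product


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


# Membership bits in the order (content', content, e).
CUBES = list(product((1, 0), repeat=3))


def figure11_holds(maxc):
    """Check the implication of Figure 11 for every split of MAXC."""
    for values in compositions(maxc, len(CUBES)):
        l = dict(zip(CUBES, values))

        def card(position):
            # Cardinality of one set: sum over cubes that lie inside it.
            return sum(v for bits, v in l.items() if bits[position] == 1)

        # Outside this range the hypothesis size = |content| is false anyway.
        for size in range(-1, maxc + 2):
            size_after = size + 1  # the only value allowed by the hypothesis
            hypothesis = (
                card(2) == 1                                  # |e| = 1
                and l[1, 1, 1] + l[0, 1, 1] == 0              # |e ∩ content| = 0
                and card(1) == size                           # size = |content|
                and l[1, 0, 0] == 0                           # content' ⊆ content ∪ e
                and l[0, 1, 1] + l[0, 0, 1] + l[0, 1, 0] == 0  # content ∪ e ⊆ content'
            )
            conclusion = 0 < size_after and card(0) == size_after
            if hypothesis and not conclusion:
                return False
    return True
```

The check involves integers only; no sets are constructed, which is the point of the translation.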


5.1 An Elementary Upper Bound 


We next show that the algorithm in Section 4 transforms a BAPA sentence $F_0$ into a PA 
sentence whose size is at most one exponential larger and which has the same number 
of quantifier alternations. 

If $F$ is a formula in prenex form, let $\mathrm{size}(F)$ denote the size of $F$, and let 
$\mathrm{alts}(F)$ denote the number of quantifier alternations in $F$. Define the iterated 
exponentiation function $\exp_k(x)$ by $\exp_0(x) = x$ and $\exp_{k+1}(x) = 2^{\exp_k(x)}$. 
We have the following lemma. 
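
For concreteness, the iterated exponentiation function can be written as a few lines of Python; the name `exp_k` mirrors the notation above and the snippet is only an illustration of the definition.

```python
def exp_k(k, x):
    """Iterated exponentiation: exp_0(x) = x and exp_{k+1}(x) = 2 ** exp_k(x)."""
    result = x
    for _ in range(k):
        result = 2 ** result
    return result
```

Already `exp_k(3, 2)` equals 2 ** 16 = 65536, which gives a feeling for how quickly the height of the tower dominates the bound.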


Lemma 3. For the algorithm α from Section 4 there is a constant $c > 0$ such that 
$\mathrm{size}(\alpha(F_0)) \leq 2^{c \cdot \mathrm{size}(F_0)}$ and 
$\mathrm{alts}(\alpha(F_0)) = \mathrm{alts}(F_0)$. Moreover, the algorithm α runs 
in $2^{O(\mathrm{size}(F_0))}$ space. 


We next consider the worst-case space bound on BAPA. Recall first the following 
bound on space complexity for PA. 


Fact 1 [20, Chapter 3] The validity of a PA sentence of length $n$ can be decided in 
space $\exp_2(O(n))$. 

From Lemma 3 and Fact 1 we conclude that the validity of BAPA formulas can be 
decided in space $\exp_3(O(n))$. It turns out, however, that we obtain better bounds on 
BAPA validity by analyzing the number of quantifier alternations in BA and BAPA 
formulas. 

Fact 2 [61] The validity of a PA sentence of length $n$ and the number of quantifier 
alternations $m$ can be decided in space $2^{n^{O(m)}}$. 


From Lemma 3 and Fact 2 we obtain our space upper bound, which implies the upper 
bound on deterministic time. 
Theorem 4. The validity of a BAPA sentence of length $n$ and the number of quantifier 
alternations $m$ can be decided in space $\exp_2(O(mn))$, and, consequently, in 
deterministic time $\exp_3(O(mn))$. 


If we approximate quantifier alternations by formula size, we conclude that BAPA 
validity can be decided in space $\exp_2(O(n^2))$ compared to the $\exp_2(O(n))$ bound for 
Presburger arithmetic from Fact 1. Therefore, despite the exponential explosion in the size 
of the formula in the algorithm α, thanks to the same number of quantifier alternations, 
our bound is not very far from the bound for Presburger arithmetic. 


5.2 BA as a Special Case 


We next analyze the result of applying the algorithm α to a BA sentence $F_0$. By a 
BA sentence we mean a BA sentence without cardinality constraints, containing only 
the standard operations $\cap$, $\cup$, ${}^c$ and the relations $\subseteq$, $=$. At first, 
it might seem that the algorithm α is not a reasonable approach to deciding BA formulas 
given that the best upper bounds for PA are worse than the corresponding bounds for BA. 
However, we identify a special form of PA sentences 
$\mathrm{PA}_{\mathrm{BA}} = \{\alpha(F_0) \mid F_0 \text{ is in BA}\}$ and show 
that such sentences can be decided in $2^{O(n)}$ space, which is good for BA [32]. Our 
analysis shows that using binary representations of integers that correspond to the sizes 
of sets achieves a similar effect to representing these sets as bitvectors, although the two 
representations are not identical. 

Let $S$ be the number of set variables in the initial formula $F_0$ (recall that set variables
are the only variables in $F_0$). Let $l_1, \ldots, l_q$ be the free variables of the formula
$G_r(l_1, \ldots, l_q)$; then $q = 2^e$ for $e = S + 1 - r$. Let $w_1, \ldots, w_q$ be integers specifying
the values of $l_1, \ldots, l_q$. We then have the following lemma.


Lemma 5. For each $r$ where $1 \le r \le S$ the truth value of $G_r(w_1, \ldots, w_q)$ is equal to
the truth value of $G_r(\bar{w}_1, \ldots, \bar{w}_q)$ where $\bar{w}_i = \min(w_i, 2^{r-1})$.


Now consider a formula $F_0$ of size $n$ with $S$ set variables. Then $\alpha(F_0) = G_{S+1}$.
By Lemma 3, $\mathrm{size}(\alpha(F_0))$ is $O(nS2^S)$. By Lemma 5, it suffices for the outermost
variable $k$ to range over the integer interval $[0, 2^S]$, and the range of subsequent variables
is even smaller. Therefore, the value of each of the $2^{S+1} - 1$ variables can be
represented in $O(S)$ space, which is the same order of space used to represent the names
of variables themselves. This means that evaluating the formula $\alpha(F_0)$ can be done in
the same space $O(nS2^S)$ as the size of the formula. Representing the valuation
assigning values to variables can be done in $O(S2^S)$ space, so the truth value of the formula
can be evaluated in $O(nS2^S)$ space, which is certainly $2^{O(n)}$. We obtain the following
theorem.


Theorem 6. If $F_0$ is a pure BA formula with $S$ variables and of size $n$, then the truth
value of $\alpha(F_0)$ can be computed in $O(nS2^S)$ and therefore $2^{O(n)}$ space.
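
The bounding step behind Lemma 5 and Theorem 6 can be illustrated with a small sketch. The names `pow2`, `clamp`, and `bits` are hypothetical helpers introduced here for illustration; they are not part of the implementation in the appendix. The sketch caps a value at $2^{r-1}$ and counts the binary digits needed to store such a capped value.

```ocaml
(* 2^e for a small non-negative exponent e (hypothetical helper) *)
let pow2 (e : int) : int = 1 lsl e

(* Cap w at 2^(r-1), the bound used in Lemma 5; assumes r >= 1 *)
let clamp (r : int) (w : int) : int = min w (pow2 (r - 1))

(* Number of binary digits needed to write a non-negative integer *)
let rec bits (v : int) : int = if v <= 1 then 1 else 1 + bits (v / 2)
```

For instance, `List.map (clamp 3) [0; 3; 4; 9]` evaluates to `[0; 3; 4; 4]`, and `bits (pow2 5)` is `6`, so a variable ranging over $[0, 2^S]$ fits in $S + 1$ binary digits, which is the $O(S)$ space per variable used in the argument above.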


6 Experience Using Our Decision Procedure for BAPA 


We have experimented with BAPA in the context of the Jahob system [34] for verifying data
structure consistency of Java programs. Jahob parses Java source code annotated with
formulas in Isabelle syntax written in comments, generates verification conditions, and
uses decision procedures and theorem provers to discharge these verification conditions.
Jahob currently contains interfaces to the Isabelle interactive theorem prover [51], the
Simplify theorem prover [17], as well as the Omega Calculator [60] and the LASH [44]
decision procedures for PA.

Using Jahob, we have generated verification conditions for several Java program 
fragments that require reasoning about sets and their cardinalities, for example proving 
the equality relation between the number of elements in a list and the integer field size 
after they have been updated. Formulas arising from examples in Section 3 have also 
been discharged using our current implementation. We have found that Simplify is able 
to deal with some of the formulas involving only sets or only integers, but not with 
formulas that relate cardinalities of operations on sets to cardinalities of the individual 
sets. These formulas can be proved in Isabelle, but require user interaction in terms 
of auxiliary lemmas. On the other hand, our implementation of the decision procedure 
automatically discharges these formulas. 

Our current implementation makes use of some transformations and simplifications 
to reduce formula sizes. We find that eliminating set variables early by substitution is 
a highly effective optimization. When using the Omega Calculator as the backend for our
system, we also observed that lifting quantifiers to the top level noticeably improves
performance. These transformations effectively extend the range of formulas that the
current system can handle. Our current implementation of the decision procedure and 
example formulas can be found on the website [33]. 


7 Related Work 


Our paper is the first result that shows a complexity bound for the first-order theory
of BAPA. The decidability of BAPA, formulated as BA with equicardinality constraints,
was shown in [19] (see Section 2). A decision procedure for a special case of BAPA
was presented in [79], which allows only quantification over elements but not over sets
of elements. BAPA is a more general language because singleton sets can represent
elements, so quantification over sets allows modelling quantification over elements. [62]
(which appeared after [38]) shows the decidability of BA with constant cardinalities.


Presburger arithmetic. The original result on decidability of PA is [59]. The best 
known bound on formula size is [20]. This decision procedure was improved in [16] 
and subsequently in [52]. An analysis based on the number of quantifier alternations is 
presented in [61]. Our implementation uses the quantifier-elimination based Omega test [60]
which, in our current experience, outperforms other implementations we have tried. 
Among the decision procedures for full PA, [13] is the only proof-generating version, 
and is based on a version of [16]. Decidable fragments of arithmetic that go beyond PA 
include MSOL over strings [11,31] and [9]. 


Boolean Algebras. The first results on decidability of BA are from [45], [1, Chap- 
ter 4] and use quantifier elimination, from which one can derive small model prop- 
erty; [32] gives the complexity of the satisfiability problem. [48] studies unification in 
Boolean rings. The quantifier-free fragment of BA is shown NP-complete in [47]; see 
[39] for a generalization of this result using parameterized complexity of the
Bernays-Schönfinkel-Ramsey class of first-order logic [8, Page 258] which can be decided
using [24] or [7]. [12] gives an overview of several fragments of set theory including
theories with quantifiers but no cardinality constraints and theories with cardinality
constraints but no quantification over sets. Quantifier-free formulas are also used in 
constraint solving [2,6, 15]. Among the systems for interactively reasoning about richer 
theories of sets are Isabelle [51], HOL [26], PVS [53], TPS [3]; first-order frameworks 
such as Athena [4] can use axiomatizations of sets along with calls to resolution-based 
theorem provers such as Vampire [75] to reason about sets. 


Combinations of Decidable Theories. The techniques for combining quantifier-free 
theories [50,63] and their generalizations such as [71—73,77,78] are of great importance 
for program verification. Our paper shows a particular combination result for quantified 
formulas, which add additional expressive power in writing specifications. Among the 
general results for quantified formulas are the Feferman-Vaught theorem for products 
[19] and term powers [36, 37]. While we have found quantifiers to be useful in several 
contexts, many problems can be encoded in quantifier-free formulas, so it is interesting 
to consider a combination of BAPA with solvers for quantifier-free formulas [21,25,69]. 
Description logics [5] and two-variable logic with counting [27,54,58] support sets and 
cardinalities, and additionally support relations, but do not allow quantification over 
sets. 


Analyses of Dynamic Data Structures. In addition to the new technical results, one 
of the contributions of our paper is to identify the uses of our decision procedure for 
verifying data structure consistency. We have shown how BAPA enables the verifica- 
tion tools to reason about sets and their sizes. This capability is particularly important 
for analyses that handle dynamically allocated data structures where the number of ob- 
jects is statically unbounded [35, 49, 65, 76]. Recently, these approaches were extended 
to handle the combinations of the constraints representing data structure contents and 
constraints representing numerical properties of data structures [14,64]. Our result pro- 
vides a systematic mechanism for building precise and predictable versions of such 
analyses. Among other constraints used for data structure analysis, BAPA is unique 
in being a complete algorithm for an expressive theory that supports arbitrary quanti- 
fiers. As we have illustrated in Section 3, the use of quantifiers is important for proving 
verification conditions that include quantified annotations, for computing abstractions 
of program fragments that involve local variables, and for proving simulation relation 
conditions. We have also illustrated the use of BAPA for reasoning about termination of
programs that manipulate dynamic data structures by associating integer variables with 
sizes of sets that specify the objects in data structures and using techniques for proving 
termination of programs with integers [55-57]. Other possible applications of our deci- 
sion procedure include query evaluation in constraint databases [62] and loop invariant 
inference [30]. 


8 Conclusion 


Motivated by static analysis and verification of relations between data structure content 
and size, we have presented an algorithm for deciding the first-order theory of Boolean 
algebras with Presburger arithmetic (BAPA), showed an elementary upper bound on the 
worst-case complexity, implemented the algorithm and applied it to several reasoning 
tasks. Our experience indicates that the algorithm will be useful as a component of a 
decision procedure of our data structure verification system. 

APPENDIX


9 Proofs of Lemmas 


Lemma 1 Let $b_1, \ldots, b_n$ be finite disjoint sets, and $l_1, \ldots, l_n, k_1, \ldots, k_n$ be natural
numbers. Then the following two statements are equivalent: (1) There exists a finite set
$y$ such that $\bigwedge_{i=1}^{n} |b_i \cap y| = k_i \land |b_i \cap y^c| = l_i$; and (2) $\bigwedge_{i=1}^{n} |b_i| = k_i + l_i$. Moreover,
the statement continues to hold if for any subset of indices $i$ the conjunct $|b_i \cap y| = k_i$
is replaced by $|b_i \cap y| \ge k_i$ or the conjunct $|b_i \cap y^c| = l_i$ is replaced by $|b_i \cap y^c| \ge l_i$,
provided that $|b_i| = k_i + l_i$ is replaced by $|b_i| \ge k_i + l_i$, as indicated in Figure 10.


Proof. ($\Rightarrow$) Suppose that there exists a set $y$ satisfying (1). Because $b_i \cap y$ and $b_i \cap y^c$
are disjoint, $|b_i| = |b_i \cap y| + |b_i \cap y^c|$, so $|b_i| = k_i + l_i$ when the conjuncts are
$|b_i \cap y| = k_i \land |b_i \cap y^c| = l_i$, and $|b_i| \ge k_i + l_i$ if any of the original conjuncts have
inequality.

($\Leftarrow$) Suppose that (2) holds. First consider the case of equalities. Suppose that
$|b_i| = k_i + l_i$ for each of the pairwise disjoint sets $b_1, \ldots, b_n$. For each $b_i$ choose a subset
$y_i \subseteq b_i$ such that $|y_i| = k_i$. Because $|b_i| = k_i + l_i$, we have $|b_i \cap y_i^c| = l_i$. Having
chosen $y_1, \ldots, y_n$, let $y = \bigcup_{i=1}^{n} y_i$. For $i \ne j$ we have $b_i \cap y_j = \emptyset$ and $b_i \cap y_j^c = b_i$, so
$b_i \cap y = y_i$ and $b_i \cap y^c = b_i \cap y_i^c$. By the choice of $y_i$, we conclude that $y$ is the desired
set for which (1) holds. The case of inequalities is analogous: for example, in the case
$|b_i \cap y| \ge k_i \land |b_i \cap y^c| = l_i$, choose $y_i \subseteq b_i$ such that $|y_i| = |b_i| - l_i$.
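
The witness construction in the ($\Leftarrow$) direction can be sketched in code for the case of equalities. This is an illustration only: finite sets are modelled as duplicate-free lists, and `split_at`, `witness`, `card_inter`, and `card_diff` are hypothetical names that do not occur in the implementation of the appendix.

```ocaml
(* Split a list into its first k elements and the remainder *)
let rec split_at (k : int) (xs : 'a list) : 'a list * 'a list =
  if k <= 0 then ([], xs)
  else match xs with
    | [] -> ([], [])
    | x :: rest ->
        let (taken, left) = split_at (k - 1) rest in
        (x :: taken, left)

(* Given pairwise disjoint sets b_1..b_n (duplicate-free lists) and counts
   k_1..k_n with k_i <= |b_i|, pick y_i as the first k_i elements of b_i
   and return y, the union of all y_i *)
let witness (bs : 'a list list) (ks : int list) : 'a list =
  List.concat (List.map2 (fun b k -> fst (split_at k b)) bs ks)

(* Cardinality of (b inter y) for a duplicate-free list b *)
let card_inter (b : 'a list) (y : 'a list) : int =
  List.length (List.filter (fun x -> List.mem x y) b)

(* Cardinality of (b inter complement of y) for a duplicate-free list b *)
let card_diff (b : 'a list) (y : 'a list) : int =
  List.length (List.filter (fun x -> not (List.mem x y)) b)
```

With `bs = [[1; 2; 3]; [4; 5]]` and `ks = [2; 0]`, the call `witness bs ks` returns `[1; 2]`; the first set is split into parts of sizes 2 and 1 and the second into parts of sizes 0 and 2, matching $|b_i| = k_i + l_i$.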


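Lemma 3. For the algorithm $\alpha$ from Section 4 there is a constant $c > 0$ such that
$\mathrm{size}(\alpha(F_0)) \le 2^{c \cdot \mathrm{size}(F_0)}$ and $\mathrm{alts}(\alpha(F_0)) = \mathrm{alts}(F_0)$. Moreover, the algorithm $\alpha$ runs
in $2^{O(\mathrm{size}(F_0))}$ space.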


Proof. To gain some intuition on the size of $\alpha(F_0)$ compared to the size of $F_0$, compare
first the formula in Figure 11 with the original formula in Figure 6. Let $n$ denote the
size of the initial formula $F_0$ and let $S$ be the number of set variables. Note that the
following operations are polynomially bounded in time and space: 1) transforming a
formula into prenex form, 2) transforming relations $b_1 = b_2$ and $b_1 \subseteq b_2$ into the
form $|b| = 0$. Introducing set variables for each partition and replacing each $|b|$ with
a sum of integer variables yields formula $G_1$ whose size is bounded by $O(n 2^S S)$ (the
last $S$ factor is because representing a variable from the set of $K$ variables requires
space $\log K$). The subsequent transformations introduce the existing integer quantifiers,
whose size is bounded by $n$, and introduce additionally $2^{S-1} + \ldots + 2 + 1 = 2^S - 1$
new integer variables along with the equations that define them. Note that the defining
equations always have the form $l'_i = l_{2i-1} + l_{2i}$ and have size bounded by $S$. We
therefore conclude that the size of $\alpha(F_0)$ is $O(nS(2^S + 2^S))$ and therefore $O(nS2^S)$,
which is certainly $O(2^{cn})$ for any $c > 1$. Moreover, note that we have obtained a more
precise bound $O(nS2^S)$ indicating that the exponential explosion is caused only by set
variables. Finally, the fact that the number of quantifier alternations is the same in $F_0$
and $\alpha(F_0)$ is immediate because the algorithm replaces one set quantifier with a block
of corresponding integer quantifiers.
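
The variable count used in this proof can be checked mechanically. The following is a minimal sketch with hypothetical helper names (`pow2`, `new_vars`, `total_vars`); it only mirrors the arithmetic of the proof and is not part of the algorithm itself.

```ocaml
(* 2^e for a small non-negative exponent e (hypothetical helper) *)
let pow2 (e : int) : int = 1 lsl e

(* Integer variables introduced while eliminating s set quantifiers
   one at a time: 2^(s-1) + ... + 2 + 1 *)
let rec new_vars (s : int) : int =
  if s <= 0 then 0 else pow2 (s - 1) + new_vars (s - 1)

(* The 2^s variables naming the initial partition plus the new ones *)
let total_vars (s : int) : int = pow2 s + new_vars s
```

For `s = 3` this gives `new_vars 3 = 7`, which equals $2^3 - 1$, and `total_vars 3 = 15`, which equals $2^4 - 1$.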


Lemma 5. For each $r$ where $1 \le r \le S$ the truth value of $G_r(w_1, \ldots, w_q)$ is equal to
the truth value of $G_r(\bar{w}_1, \ldots, \bar{w}_q)$ where $\bar{w}_i = \min(w_i, 2^{r-1})$.


Proof. We prove the claim by induction. For $r = 1$, observe that the translation of a
quantifier-free part of the pure BA formula yields a PA formula $F_1$ all of whose atomic
formulas are of the form $l_{i_1} + \ldots + l_{i_k} = 0$, which are equivalent to $\bigwedge_{j=1}^{k} l_{i_j} = 0$.
Therefore, the truth value of $F_1$ depends only on whether the integer variables are zero
or non-zero, which means that we may restrict the variables to the interval $[0, 1]$.

For the inductive step, consider the elimination of a set variable, and assume that
the property holds for $G_r$ and for all $q$-tuples of non-negative integers $w_1, \ldots, w_q$.
Let $q' = q/2$ and let $w'_1, \ldots, w'_{q'}$ be a tuple of non-negative integers. We show that
$G_{r+1}(w'_1, \ldots, w'_{q'})$ is equivalent to $G_{r+1}(\bar{w}'_1, \ldots, \bar{w}'_{q'})$.

Suppose first that $G_{r+1}(\bar{w}'_1, \ldots, \bar{w}'_{q'})$ holds. Then for each $\bar{w}'_i$ there are $u_{2i-1}$ and
$u_{2i}$ such that $\bar{w}'_i = u_{2i-1} + u_{2i}$ and $G_r(u_1, \ldots, u_q)$. We define witnesses $w_1, \ldots, w_q$
as follows. If $w'_i \le 2^r$, then let $w_{2i-1} = u_{2i-1}$ and $w_{2i} = u_{2i}$. If $w'_i > 2^r$ then either
$u_{2i-1} \ge 2^{r-1}$ or $u_{2i} \ge 2^{r-1}$ (or both). If $u_{2i-1} \ge 2^{r-1}$, then let $w_{2i-1} = w'_i - u_{2i}$
and $w_{2i} = u_{2i}$. Note that $G_r(\ldots, w_{2i-1}, \ldots) \iff G_r(\ldots, u_{2i-1}, \ldots) \iff
G_r(\ldots, 2^{r-1}, \ldots)$ by induction hypothesis because both $u_{2i-1} \ge 2^{r-1}$ and
$w_{2i-1} \ge 2^{r-1}$. For $w_1, \ldots, w_q$ chosen as above we therefore have $w'_i = w_{2i-1} + w_{2i}$ and
$G_r(w_1, \ldots, w_q)$, which by definition of $G_{r+1}$ means that $G_{r+1}(w'_1, \ldots, w'_{q'})$ holds.

Conversely, suppose that $G_{r+1}(w'_1, \ldots, w'_{q'})$ holds. Then there are $w_1, \ldots, w_q$ such
that $G_r(w_1, \ldots, w_q)$ and $w'_i = w_{2i-1} + w_{2i}$. If $w_{2i-1} \le 2^{r-1}$ and $w_{2i} \le 2^{r-1}$ then
$w'_i \le 2^r$, so let $u_{2i-1} = w_{2i-1}$ and $u_{2i} = w_{2i}$. If $w_{2i-1} > 2^{r-1}$ and $w_{2i} > 2^{r-1}$
then let $u_{2i-1} = 2^{r-1}$ and $u_{2i} = 2^{r-1}$. If $w_{2i-1} > 2^{r-1}$ and $w_{2i} \le 2^{r-1}$ then let
$u_{2i-1} = 2^r - w_{2i}$ and $u_{2i} = w_{2i}$. By induction hypothesis we have $G_r(u_1, \ldots, u_q) =
G_r(w_1, \ldots, w_q)$. Furthermore, $u_{2i-1} + u_{2i} = \bar{w}'_i$, so $G_{r+1}(\bar{w}'_1, \ldots, \bar{w}'_{q'})$ holds by definition
of $G_{r+1}$.


10 BAPA with Potentially Infinite Sets 


We next sketch the extension of our algorithm $\alpha$ (Section 4) to the case when the
universe of the structure may be infinite, and the underlying language has the ability to
distinguish between finite and infinite sets. Infinite sets are useful in program
analysis for modelling pools of objects such as those arising in dynamic object allocation.
This section presents an approach that avoids directly reasoning about cardinalities of
infinite sets and thus remains within the language of PA. (As was observed in [19], an
alternative is to use a generalization of PA that admits infinite cardinals.)

We generalize the language of BAPA and the interpretation of BAPA operations as 
follows. 


1. Introduce unary predicate fin(b) which is true iff b is a finite set. The predicate 
fin(b) allows us to generalize our algorithm to the case of infinite universe, and 
additionally gives the expressive power to distinguish between finite and infinite 
sets. For example, using fin(b) we can express bounded quantification over finite or 
over infinite sets. 

2. Define $|b|$ to be the integer zero if $b$ is infinite, and the cardinality of $b$ if $b$ is finite.

3. Introduce propositional variables denoted by letters such as $p, q$, and
quantification over propositional variables. Extend also the underlying PA formulas with
propositional variables, which is acceptable because a variable $p$ can be treated
as a shorthand for an integer from $\{0, 1\}$ if each use of $p$ as an atomic formula is
interpreted as the atomic formula $(p = 1)$. Our extended algorithm uses the
equivalences $\mathrm{fin}(b) \Leftrightarrow p$ to represent the finiteness of sets just as it uses the equations
$|b| = l$ to represent the cardinalities of finite sets.

4. Introduce a propositional constant FINU such that $\mathrm{fin}(\mathcal{U}) \Leftrightarrow \mathrm{FINU}$. This
propositional constant enables equivalence preserving quantifier elimination over the set
of models that includes both models with finite universe $\mathcal{U}$ and the models with
infinite universe $\mathcal{U}$.

Denote the resulting extended language $\mathrm{BAPA}^\infty$.
The following lemma generalizes Lemma 1 for the case of equalities.


Lemma 7. Let $b_1, \ldots, b_n$ be disjoint sets, $l_1, \ldots, l_n, k_1, \ldots, k_n$ be natural numbers,
and $p_1, \ldots, p_n, q_1, \ldots, q_n$ be propositional values. Then the following two statements
are equivalent:

1. There exists a set $y$ such that

\[
\bigwedge_{i=1}^{n} |b_i \cap y| = k_i \land (\mathrm{fin}(b_i \cap y) \Leftrightarrow p_i) \land
|b_i \cap y^c| = l_i \land (\mathrm{fin}(b_i \cap y^c) \Leftrightarrow q_i) \qquad (7)
\]

2.

\[
\bigwedge_{i=1}^{n} (p_i \land q_i \Rightarrow |b_i| = k_i + l_i) \land
(\mathrm{fin}(b_i) \Leftrightarrow (p_i \land q_i)) \qquad (8)
\]

Proof. ($\Rightarrow$) Suppose that there exists a set $y$ satisfying (7). From $b_i = (b_i \cap y) \cup (b_i \cap y^c)$,
we have $\mathrm{fin}(b_i) \Leftrightarrow (p_i \land q_i)$. Furthermore, if $p_i$ and $q_i$ hold, then both $b_i \cap y$ and $b_i \cap y^c$
are finite so the relation $|b_i| = |b_i \cap y| + |b_i \cap y^c|$ holds.

($\Leftarrow$) Suppose that (8) holds. For each $i$ we choose a subset $y_i \subseteq b_i$, depending on
the truth values of $p_i$ and $q_i$, as follows.


1. If both $p_i$ and $q_i$ are true, then $\mathrm{fin}(b_i)$ holds, so $b_i$ is finite. Choose $y_i$ as any subset
of $b_i$ with $k_i$ elements, which is possible since $b_i$ has $k_i + l_i$ elements.

2. If $p_i$ does not hold, but $q_i$ holds, then $\mathrm{fin}(b_i)$ does not hold, so $b_i$ is infinite. Choose
$y'_i$ as any finite subset of $b_i$ with $l_i$ elements and let $y_i = b_i \setminus y'_i$ be the corresponding
cofinite set.

3. Analogously, if $p_i$ holds, but $q_i$ does not hold, then $b_i$ is infinite; choose $y_i$ as any
finite subset of $b_i$ with $k_i$ elements.

4. If $p_i$ and $q_i$ are both false, then $b_i$ is also infinite; every infinite set can be written
as a disjoint union of two infinite sets, so let $y_i$ be one such set.


Let $y = \bigcup_{i=1}^{n} y_i$. As in the proof of Lemma 1, we have $b_i \cap y = y_i$ and $b_i \cap y^c = b_i \cap y_i^c$.
By construction of $y_1, \ldots, y_n$ we conclude that (7) holds.
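
The cardinality convention of this section and the combination rule expressed by statement (8) can be sketched as follows. The type `size` and the functions `encode`, `combine`, and `union_size` are hypothetical names chosen for this illustration; the sketch assumes that a set is described only by its size, either a natural number or infinite.

```ocaml
(* Size of a possibly infinite set (hypothetical type for illustration) *)
type size = Fin of int | Inf

(* The pair (|b|, fin(b)), using the convention |b| = 0 for infinite b *)
let encode (s : size) : int * bool = match s with
  | Fin n -> (n, true)
  | Inf -> (0, false)

(* Combine the encoding (k, p) of (b inter y) with the encoding (l, q) of
   (b inter complement of y) into the encoding of b, following (8):
   b is finite iff both parts are, and then |b| = k + l *)
let combine ((k, p) : int * bool) ((l, q) : int * bool) : int * bool =
  if p && q then (k + l, true) else (0, false)

(* Reference semantics: size of a disjoint union *)
let union_size (a : size) (b : size) : size = match a, b with
  | Fin m, Fin n -> Fin (m + n)
  | _ -> Inf
```

In this sketch `combine (encode a) (encode b)` agrees with `encode (union_size a b)` for all sizes `a` and `b`; for example, combining `Fin 2` and `Fin 3` yields `(5, true)`, while combining `Inf` with `Fin 3` yields `(0, false)`.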


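The algorithm $\alpha$ for $\mathrm{BAPA}^\infty$ is analogous to the algorithm for BAPA. In each step,
the new algorithm maintains a formula of the form

\[
\begin{array}{l}
Q_p v_p \ldots Q_r v_r. \\
\exists^+ l_1 \ldots l_q.\ \exists p_1 \ldots p_q. \\
\left(\bigwedge_{i=1}^{q} |s_i| = l_i \land (\mathrm{fin}(s_i) \Leftrightarrow p_i)\right) \land G_r
\end{array}
\]

As in Section 4, the algorithm eliminates an integer quantifier $\exists k$ by letting $G_{r+1} =
\exists k. G_r$ and eliminates an integer quantifier $\forall k$ by letting $G_{r+1} = \forall k. G_r$. Furthermore,
just as the algorithm in Section 4 uses Lemma 1 to reduce a set quantifier to integer
quantifiers, the new algorithm uses Lemma 7 for this purpose. The algorithm replaces

\[
\begin{array}{l}
\exists y.\ \exists^+ l_1 \ldots l_q.\ \exists p_1 \ldots p_q. \\
\left(\bigwedge_{i=1}^{q} |s_i| = l_i \land (\mathrm{fin}(s_i) \Leftrightarrow p_i)\right) \land G_r
\end{array}
\]

with

\[
\begin{array}{l}
\exists^+ l'_1 \ldots l'_{q'}.\ \exists p'_1 \ldots p'_{q'}. \\
\left(\bigwedge_{i=1}^{q'} |s'_i| = l'_i \land (\mathrm{fin}(s'_i) \Leftrightarrow p'_i)\right) \land G_{r+1}
\end{array}
\]

for $q' = q/2$, and

\[
\begin{array}{l}
G_{r+1} \equiv \exists^+ l_1 \ldots l_q.\ \exists p_1, \ldots, p_q. \\
\quad \left(\bigwedge_{i=1}^{q'} (p_{2i-1} \land p_{2i} \Rightarrow l'_i = l_{2i-1} + l_{2i}) \land
(p'_i \Leftrightarrow (p_{2i-1} \land p_{2i}))\right) \\
\quad \land\ G_r
\end{array}
\]

For the quantifier $\forall y$ the algorithm analogously generates

\[
\begin{array}{l}
G_{r+1} \equiv \forall^+ l_1 \ldots l_q.\ \forall p_1, \ldots, p_q. \\
\quad \left(\bigwedge_{i=1}^{q'} (p_{2i-1} \land p_{2i} \Rightarrow l'_i = l_{2i-1} + l_{2i}) \land
(p'_i \Leftrightarrow (p_{2i-1} \land p_{2i}))\right) \\
\quad \Rightarrow\ G_r
\end{array}
\]

After eliminating all quantifiers, the algorithm obtains a formula of the form
$\exists^+ l.\ \exists p.\ |\mathcal{U}| = l \land (\mathrm{fin}(\mathcal{U}) \Leftrightarrow p) \land G_{p+1}(l, p)$. We define the result of the algorithm
to be the PA sentence $G_{p+1}(\mathrm{MAXC}, \mathrm{FINU})$.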

This completes our description of the generalized algorithm $\alpha$ for $\mathrm{BAPA}^\infty$. The
complexity analysis from Section 5 also applies to the generalized version. We also
note that our algorithm yields an equivalent formula over any family of models. A
sentence is valid in a set of models iff it is valid on each model. Therefore, the validity
of a $\mathrm{BAPA}^\infty$ sentence $F_0$ is given by applying to the formula $\alpha(F_0)(\mathrm{MAXC}, \mathrm{FINU})$
a form of universal quantifier over all pairs (MAXC, FINU) that determine the
characteristics of the models in question. For example, for the validity over the models
with infinite universe we use $\alpha(F_0)(0, \mathit{false})$, for validity over all finite models we use
$\forall k.\ \alpha(F_0)(k, \mathit{true})$, and for the validity over all models we use the PA formula


\[ \alpha(F_0)(0, \mathit{false}) \land \forall k.\ \alpha(F_0)(k, \mathit{true}). \]


We therefore have the following result. 


Theorem 8. The algorithm above effectively reduces the validity of $\mathrm{BAPA}^\infty$ sentences
to the validity of Presburger arithmetic formulas with the same number of quantifier
alternations, with an increase in formula size that is exponential in the number of set variables;
the reduction works for each of the following: 1) the set of all models, 2) the set of
models with infinite universe only, and 3) the set of all models with finite universe.


11 Relationship with MSOL over Strings 


The monadic second-order logic (MSOL) over strings is a decidable logic that can 
encode Presburger arithmetic by encoding addition using one successor symbol and 
quantification over sets. This logic therefore simultaneously supports sets and integers, 
so it is natural to examine its relationship with BAPA. It turns out that there are two 
important differences between MSOL over strings and BAPA: 

1. BAPA can express relationships of the form $|A| = k$ where $A$ is a set variable and
$k$ is an integer variable; such a relation is not definable in MSOL over strings.
2. In MSOL over strings, the sets contain binary digits of an integer whereas in BAPA
the sets contain uninterpreted elements.

Given these differences, a natural question is to consider the decidability of an extension
of MSOL that allows stating relations $|A| = k$ where $A$ is a set of digits and $k$ is an
integer variable. Note that by saying $\exists k.\ |A| = k \land |B| = k$ we can express $|A| = |B|$, so
we obtain MSOL with equicardinality constraints. However, extensions of MSOL over
strings with equicardinality constraints are known to be undecidable; we review some
reductions in Section 11.2. Undecidability results such as these are what perhaps led to
the conjectures that BAPA itself is undecidable [79, Page 12]. In this paper we pointed
out that BAPA is, in fact, decidable and proved that it has an elementary decision
procedure. Moreover, we present a combination of BA with MSOL over n-successors that
is still decidable.


11.1 Decidability of MSOL with Cardinalities on Uninterpreted Sets 


We next note that our algorithm also applies to monadic second-order logic of n
successors, which is more expressive than PA itself [70, Page 400], [10].

Consider the multisorted language BAMSOL defined as follows. First, BAMSOL
contains all relations of monadic second-order logic of n successors, whose variables
range over strings over an n-ary alphabet and sets of such strings. Second, BAMSOL
contains sets of uninterpreted elements and boolean algebra operations on them. Third,
BAMSOL allows stating relationships of the form $|x| = k$ where $x$ is a set of
uninterpreted elements and $k$ is a string representing a natural number. Because all PA
operations are definable in MSOL of 1-successor, the algorithm $\alpha$ applies in this case
as well. Indeed, the algorithm $\alpha$ only needs a "lower bound" on the expressive power
of the theory of integers that BA is combined with: the ability to state constraints of
the form $l'_i = l_{2i-1} + l_{2i}$, and quantification over integers. Therefore, applying $\alpha$ to a
BAMSOL formula results in an MSOL formula. This shows that BAMSOL is decidable
and can be decided using a combination of algorithm $\alpha$ and a tool such as [31]. By
Lemma 3, the decision procedure for BAMSOL based on translation to MSOL has an
upper bound of $\exp_n(O(n))$ using a decision procedure such as [31]. The corresponding
non-elementary lower bound follows from the lower bound on MSOL itself [68].


11.2 Undecidability of MSOL of Integer Sets with Cardinalities 


We first note that there is a reduction from the Post Correspondence Problem that shows 
the undecidability of MSOL with equicardinality constraints. Namely, we can repre- 
sent binary strings by finite sets of natural numbers. In this encoding, given a position, 
MSOL itself can easily express the local property that, at a given position, a string con- 
tains a given finite substring. The equicardinality gives the additional ability of finding 
an n-th element of an increasing sequence of elements. To encode a PCP instance, it 
suffices to write a formula checking the existence of a string (represented as set A) and 
the existence of two increasing sequences of equal length (represented by sets U and 
D), such that for each $i$, there exists a pair $(a_j, b_j)$ of the PCP instance such that the
position starting at $U_i$ contains the constant string $a_j$, and $U_{i+1} = U_i + |a_j|$, and similarly
the position starting at $D_i$ contains $b_j$ and $D_{i+1} = D_i + |b_j|$.

12 O’Caml source code of algorithm $\alpha$


(* datatype of formulas *)
type ident = string

type binder = Forallset | Existsset
            | Forallint | Existsint | Forallnat | Existsnat

type form =
  | Not of form
  | And of form list | Or of form list | Impl of form * form
  | Bind of binder * ident * form
  | Inteq of intTerm * intTerm | Less of intTerm * intTerm
  | Seteq of setTerm * setTerm | Subseteq of setTerm * setTerm
and intTerm =
  | Intvar of ident | Const of int
  | Plus of intTerm list | Minus of intTerm * intTerm | Times of int * intTerm
  | Card of setTerm
and setTerm =
  | Setvar of ident | Emptyset | Fullset | Complement of setTerm
  | Union of setTerm list | Inter of setTerm list

(* name of the integer variable denoting the size of the universe *)
let maxcard = "MAXC"

(* -------------------------- *)

(* algorithm \alpha *) 
(* replace Seteq and Subseteq with Card(...)=0 *) 

let simplify_set_relations (f:form) : form =
  let rec sform f = match f with
    | Not f -> Not (sform f)
    | And fs -> And (List.map sform fs)
    | Or fs -> Or (List.map sform fs)
    | Impl(f1,f2) -> Impl(sform f1,sform f2)
    | Bind(b,id,f1) -> Bind(b,id,sform f1)
    | Less(it1,it2) -> Less(it1,it2)
    | Inteq(it1,it2) -> Inteq(it1,it2)
    | Seteq(st1,st2) -> And[sform (Subseteq(st1,st2));
                            sform (Subseteq(st2,st1))]
    | Subseteq(st1,st2) -> Inteq(Card(Inter[st1;Complement st2]),Const 0)
  in sform f


(* split f into quantifier sequence and body;
   the innermost quantifier comes first in the returned sequence *)
let split_quants_body f =
  let rec vl f acc = match f with
    | Bind(b,id,f1) -> vl f1 ((b,id)::acc)
    | f -> (acc,f)
  in vl f []


(* extract set variables from quantifier sequence *) 
let extract_set_vars quants =
  List.map (fun (b,id) -> id)
    (List.filter (fun (b,id) -> (b = Forallset || b = Existsset))
       quants)

type partition = (ident * setTerm) list


(* make canonical name for integer variable naming a cube *) 
let make_name sts =
  let rec mk sts = match sts with
    | [] -> ""
    | (Setvar _)::sts1 -> "1" ^ mk sts1
    | (Complement (Setvar _))::sts1 -> "0" ^ mk sts1
    | _ -> failwith "make_name: unexpected partition form"
  in "l_" ^ mk sts


(* make all cubes over vs *) 
let generate_partition (vs : ident list) : partition =
  let add id ss = (Setvar id)::ss in
  let addc id ss = Complement (Setvar id)::ss in
  let add_set id inters =
    List.map (add id) inters @
    List.map (addc id) inters in
  let mk_nm is = (make_name is, Inter is) in
  List.map mk_nm
    (List.map List.rev
       (List.fold_right add_set vs [[]]))


(* is the set term true in the set valuation 
-- reduces to propositional reasoning *) 

let istrue (st:setTerm) (id,ivaluation) : bool =
  let valuation = match ivaluation with
    | Inter v -> v
    | _ -> failwith "wrong valuation" in
  let lookup v =
    if List.mem (Setvar v) valuation then true
    else if List.mem (Complement (Setvar v)) valuation then false
    else failwith "istrue: unbound var in valuation" in
  let rec check st = match st with
    | Setvar v -> lookup v
    | Emptyset -> false
    | Fullset -> true
    | Complement st1 -> not (check st1)
    | Union sts -> List.fold_right (fun st1 t -> check st1 || t) sts false
    | Inter sts -> List.fold_right (fun st1 t -> check st1 && t) sts true
  in check st


(* compute cardinality of set expression 
as a sum of cardinalities of cubes *) 
let get_sum (p:partition) (st:setTerm) : intTerm list = 
let get_list (id,inter) = match inter with 
| Inter ss -> ss 
| _ -> failwith "failed inv in get_sum" 
in 
List.map (fun (id,inter) -> Intvar id) 
(List.filter (istrue st) p) 


(* replace cardinalities of sets with sums of 
variables denoting cube cardinalities *) 
let replace_cards (p:partition) (f:form) : form =
  let rec repl f = match f with
    | Not f -> Not (repl f)
    | And fs -> And (List.map repl fs)
    | Or fs -> Or (List.map repl fs)
    | Impl(f1,f2) -> Impl(repl f1,repl f2)
    | Bind(b,id,f1) -> Bind(b,id,repl f1)
    | Less(it1,it2) -> Less(irepl it1,irepl it2)
    | Inteq(it1,it2) -> Inteq(irepl it1,irepl it2)
    | Seteq(_,_) | Subseteq(_,_) -> failwith "failed inv in replace_cards"
  and irepl it = match it with
    | Intvar _ -> it
    | Const _ -> it
    | Plus its -> Plus (List.map irepl its)
    | Minus(it1,it2) -> Minus(irepl it1,irepl it2)
    | Times(k,it1) -> Times(k,irepl it1)
    | Card st -> Plus (get_sum p st)
  in repl f


let apply_quants quants f = 
List.fold_right (fun (b,id) f -> Bind(b,id,f)) quants f 


(* pair up cubes that differ only in id; each tuple is
   (merged cube, its name, name of the id part, name of the complement part) *)
let make_defining_eqns id part =
  let rec mk ps = match ps with
    | [] -> []
    | (id1,Inter (st1::sts1)) :: (id2,Inter (st2::sts2)) :: ps1
      when (st1=Setvar id && st2=Complement (Setvar id) && sts1=sts2) ->
        (Inter sts1,make_name sts1,id1,id2) :: mk ps1
    | _ -> failwith "make_triples: unexpected partition form" in
  let rename_last lss = match lss with
    | [(s,l,l1,l2)] -> [(s,maxcard,l1,l2)]
    | _ -> lss in
  rename_last (mk part)


(* -------------------------- *) 


(* main loop of the algorithm *) 


let rec eliminate_all quants part gr = match quants with
  | [] -> gr
  | (Existsint,id)::quants1 ->
      eliminate_all quants1 part (Bind(Existsint,id,gr))
  | (Forallint,id)::quants1 ->
      eliminate_all quants1 part (Bind(Forallint,id,gr))
  | (Existsnat,id)::quants1 ->
      eliminate_all quants1 part (Bind(Existsnat,id,gr))
  | (Forallnat,id)::quants1 ->
      eliminate_all quants1 part (Bind(Forallnat,id,gr))
  | (Existsset,id)::quants1 ->
      let eqns = make_defining_eqns id part in
      let newpart = List.map (fun (s,l',_,_) -> (l',s)) eqns in
      let mk_conj (_,l',l1,l2) = Inteq(Intvar l',Plus[Intvar l1;Intvar l2]) in
      let conjs = List.map mk_conj eqns in
      let lquants = List.map (fun (l,_) -> (Existsnat,l)) part in
      let gr1 = apply_quants lquants (And (conjs @ [gr])) in
      eliminate_all quants1 newpart gr1
  | (Forallset,id)::quants1 ->
      let eqns = make_defining_eqns id part in
      let newpart = List.map (fun (s,l',_,_) -> (l',s)) eqns in
      let mk_conj (_,l',l1,l2) = Inteq(Intvar l',Plus[Intvar l1;Intvar l2]) in
      let conjs = List.map mk_conj eqns in
      let lquants = List.map (fun (l,_) -> (Forallnat,l)) part in
      let gr1 = apply_quants lquants (Impl(And conjs, gr)) in
      eliminate_all quants1 newpart gr1


(* putting everything together *) 


let alpha (f:form) : form = 
(* assumes f in prenex form *) 


let (quants,fm) = split_quants_body f in 
let fml = simplify_set_relations fm in 
let setvars = List.rev (extract_set_vars quants) in 


let part = generate_partition setvars in 
let gl = replace_cards part fml in 
eliminate_all quants part gl 